66 research outputs found

    Profile Diversity for Phenotyping Data Search and Recommendation

    Session: Applications innovantes. National audience. In this work, we study profile diversity, a novel approach to searching scientific documents. Many prior works have combined keyword relevance with document popularity in a "social" scoring function. Diversifying the content of the returned documents has also been studied in depth in search, advertising, database queries, and recommendation. We believe our work is the first to address profile diversity in order to tackle the problem of result lists that are highly popular but too narrowly focused. We show how we adapt Fagin's threshold algorithms to return the documents that are the most relevant, the most popular, and also the most diverse, in terms of either content or profiles. We also provide a set of simulations on two benchmarks to validate our scoring function.
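
    As a rough illustration of the kind of threshold algorithm the abstract refers to, the sketch below implements a textbook top-k threshold algorithm over per-criterion score lists. It is not the authors' adaptation: the diversity criteria are not modeled, the aggregation is assumed to be a plain sum, scores are assumed non-negative, at least one list is assumed to be given, and the names `threshold_algorithm`, `relevance`, and `popularity` are hypothetical.

```python
import heapq


def threshold_algorithm(lists, k):
    """Top-k documents by summed score, scanning sorted lists in parallel.

    `lists` is a sequence of dicts mapping document id -> non-negative score,
    one dict per criterion. A document missing from a dict scores 0 there.
    """
    # Sorted access: each criterion ranked by decreasing score.
    ranked = [sorted(l.items(), key=lambda kv: kv[1], reverse=True) for l in lists]
    seen = {}  # document id -> aggregated score
    max_depth = max(len(r) for r in ranked)
    for depth in range(max_depth):
        last_scores = []
        for r in ranked:
            if depth < len(r):
                doc, score = r[depth]
                last_scores.append(score)
                if doc not in seen:
                    # Random access: fetch the document's score in every list.
                    seen[doc] = sum(l.get(doc, 0.0) for l in lists)
            else:
                # Exhausted list: unseen documents score 0 on this criterion.
                last_scores.append(0.0)
        # No unseen document can score higher than this bound.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early stop: the k-th best already reaches the bound
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.1}
popularity = {"d1": 0.2, "d2": 0.7, "d3": 0.9}
best = threshold_algorithm([relevance, popularity], k=1)
```

    In this toy input the scan stops after two rounds of sorted access, once the best aggregated score seen so far reaches the threshold.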

    A two-head loss function for deep Average-K classification

    Average-K classification is an alternative to top-K classification in which the number of labels returned varies with the ambiguity of the input image but must average to K over all the samples. A simple method to solve this task is to threshold the softmax output of a model trained with the cross-entropy loss. This approach is theoretically proven to be asymptotically consistent, but it is not guaranteed to be optimal for a finite set of samples. In this paper, we propose a new loss function based on a multi-label classification head in addition to the classical softmax. This second head is trained using pseudo-labels generated by thresholding the softmax head while guaranteeing that K classes are returned on average. We show that this approach allows the model to better capture ambiguities between classes and, as a result, to return more consistent sets of possible classes. Experiments on two datasets from the literature demonstrate that our approach outperforms the softmax baseline, as well as several other loss functions more generally designed for weakly supervised multi-label classification. The gains are larger the higher the uncertainty, especially for classes with few samples
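
    The simple baseline mentioned in the abstract (thresholding the softmax output so that K labels are returned on average) can be sketched as follows. This is a minimal illustration, not the paper's two-head method: it picks the global threshold directly on the given batch by keeping the n*K largest probabilities, whereas in practice the threshold would presumably be estimated on held-out data. The function name `average_k_sets` and the toy probabilities are assumptions.

```python
def average_k_sets(probs, k):
    """Return one label set per sample so that set sizes average to k.

    `probs` is a list of per-sample probability lists (e.g. softmax outputs).
    Keeping the n*k largest probabilities overall is equivalent to applying
    a single global threshold to every sample.
    """
    n = len(probs)
    budget = round(n * k)  # total number of labels to hand out
    # Flatten into (probability, sample index, class index) triples.
    scored = sorted(
        ((p, i, c) for i, row in enumerate(probs) for c, p in enumerate(row)),
        key=lambda t: t[0],
        reverse=True,
    )
    sets = [set() for _ in range(n)]
    for _, i, c in scored[:budget]:
        sets[i].add(c)
    return sets


probs = [
    [0.90, 0.05, 0.05],  # unambiguous sample
    [0.40, 0.35, 0.25],  # ambiguous sample
]
sets = average_k_sets(probs, k=2)
```

    With these toy inputs the unambiguous sample receives one label and the ambiguous one receives three, so the set sizes average to K = 2 while adapting to each sample's ambiguity.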

    Location-Based Plant Species Prediction Using A CNN Model Trained On Several Kingdoms - Best Method Of GeoLifeCLEF 2019 Challenge

    International audience. This technical report describes the model that achieved the best performance in the GeoLifeCLEF challenge, whose objective was to evaluate methods for predicting plant species from their geographical location. Our method adapts the Inception v3 architecture, originally designed for the classification of RGB images. We modified the input layer of this architecture so that it processes the spatialized environmental tensors as images with 77 distinct channels. Using this architecture, we trained several models that differed mainly in the training data used and in the predicted output classes. One of the main objectives, in particular, was to compare the performance of a model trained on plant occurrences only with that of a model trained on all available occurrences, including species of other kingdoms. Our results show that the global model performs consistently better than the plant-specific model. This suggests that the convolutional neural network is able to capture some inter-dependencies among all species and that this information significantly improves the generalisation capacity of the model for any species.

    PlantRT : a Distributed Recommendation Tool for Citizen Science

    International audience. Web 2.0 users produce large amounts of diverse data, which they store in a wide variety of systems. In this work, we focus on the particular case of botanists. Indeed, establishing precise knowledge of the identity, geographical distribution, and evolution of living species is essential for the sustainability of this biodiversity, as well as for the human species. The emergence of citizen science and social networks provides additional tools that foster the creation of large communities of nature observers, who have begun to produce huge collections of multimedia data. However, the inherent complexity of building these collections causes some distrust among users, who do not wish to store their data on a central server. In this work, we built a multi-site prototype, in which each site can represent 1 to n users, enabling large-scale search and recommendation of diversified plant observations.

    Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification

    International audience. In modern classification tasks, the number of labels keeps growing, as does the size of the datasets encountered in practice. As the number of classes increases, class ambiguity and class imbalance make it increasingly difficult to achieve high top-1 accuracy. Meanwhile, top-K metrics (metrics allowing K guesses) have become popular, especially for performance reporting. Yet proposing top-K losses tailored for deep learning remains a challenge, both theoretically and practically. In this paper, we introduce a stochastic top-K hinge loss inspired by recent developments on top-K calibrated losses. Our proposal is based on smoothing the top-K operator, building on the flexible "perturbed optimizer" framework. We show that our loss function performs very well on balanced datasets, while requiring significantly less computation time than the state-of-the-art top-K loss function. In addition, we propose a simple variant of our loss for the imbalanced case. Experiments on a heavy-tailed dataset show that our loss function significantly outperforms other baseline loss functions.
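
    To give an intuition for smoothing a top-K operator by perturbation, the sketch below estimates the expected K-th largest score under random noise by Monte Carlo sampling. It only illustrates the general idea of perturbed smoothing and is not the loss proposed in the paper: the Gaussian noise, the parameters `epsilon` and `n_samples`, and the function names are all assumptions.

```python
import random


def kth_largest(scores, k):
    """Exact (non-smooth) K-th largest value of a list of scores."""
    return sorted(scores, reverse=True)[k - 1]


def smoothed_kth_largest(scores, k, epsilon=0.5, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[kth_largest(scores + epsilon * Z)].

    Z is standard Gaussian noise (an assumption made for this sketch).
    Averaging over perturbations turns the piecewise-linear K-th largest
    operator into a smooth function of the scores.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        noisy = [s + epsilon * rng.gauss(0.0, 1.0) for s in scores]
        total += kth_largest(noisy, k)
    return total / n_samples


scores = [3.0, 1.0, 2.0]
exact = kth_largest(scores, k=2)
smooth = smoothed_kth_largest(scores, k=2, epsilon=0.5)
```

    Setting `epsilon` to zero recovers the exact operator, while larger values trade fidelity for smoothness.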

    A comparative study of fine-grained classification methods in the context of the LifeCLEF plant identification challenge 2015

    International audience. This paper describes Inria's participation in the plant identification task of the LifeCLEF 2015 challenge. The aim of the task was to produce a list of relevant species for a large set of plant observations related to 1000 species of trees, herbs and ferns living in Western Europe. Each plant observation contained several annotated pictures with organ/view tags: Flower, Leaf, Fruit, Stem, Branch, Entire, Scan (exclusively of leaf). To address this challenge, we experimented with two popular families of classification techniques: convolutional neural networks (CNN) on the one hand and Fisher vector-based discriminant models on the other. Our results show that the CNN approach achieves much better performance than the Fisher vectors. Furthermore, we show that fusing both techniques, based on Bayesian inference using the confusion matrix of each classifier, did not improve on the results of the CNN alone.

    A Distributed Collaborative Filtering Algorithm Using Multiple Data Sources

    International audience. Collaborative Filtering (CF) is one of the most commonly used recommendation methods. CF consists in predicting whether, or how much, a user will like (or dislike) an item by leveraging knowledge of that user's preferences as well as those of other users. In practice, users interact with and express their opinion on only a small subset of items, which makes the corresponding user-item rating matrix very sparse. Such data sparsity yields two main problems for recommender systems: (1) the lack of data to effectively model users' preferences, and (2) the lack of data to effectively model item characteristics. However, a recommender system provider often has access to many other data sources that can describe user interests and item characteristics (e.g., users' social network, tags associated with items, etc.). These valuable data sources may supply useful information that helps a recommender system model users' preferences and item characteristics more accurately and thus, hopefully, make more precise recommendations. For various reasons, these data sources may be managed by clusters in different data centers, thus requiring the development of distributed solutions. In this paper, we propose a new distributed collaborative filtering algorithm, which exploits and combines multiple and diverse data sources to improve recommendation quality. Our experimental evaluation using real datasets shows the effectiveness of our algorithm compared to state-of-the-art recommendation algorithms.
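
    For readers unfamiliar with collaborative filtering, the basic prediction step described above can be sketched with a plain user-based variant. This is a generic, single-machine illustration, not the distributed multi-source algorithm proposed in the paper: the cosine similarity, the toy ratings, and the names `cosine` and `predict` are assumptions, and ratings are assumed to be strictly positive.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user with non-zero similarity rated the item.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        w = cosine(ratings[user], other_ratings)
        num += w * other_ratings[item]
        den += abs(w)
    return num / den if den else None


# Sparse user-item ratings: most users rate only a few items.
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 4},
    "eve": {"a": 1, "b": 5, "c": 2},
}
p = predict(ratings, "ann", "c")
```

    Here the prediction for ann on item c falls between eve's rating of 2 and bob's rating of 4, and closer to bob's, because ann's known ratings resemble bob's more than eve's.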